Summary

Understanding infectious disease dynamics and the effect on prevalence and incidence is crucial for public 
health policies. Disease incidence and prevalence are typically not observed directly and increasingly are 
estimated through the synthesis of indirect information from multiple data sources. We demonstrate how 
an evidence synthesis approach to the estimation of human immunodeficiency virus (HIV) prevalence in 
England and Wales can be extended to infer the underlying HIV incidence. Diverse time series of data can 
be used to obtain yearly "snapshots" (with associated uncertainty) of the proportion of the population in 
4 compartments: not at risk, susceptible, HIV positive but undiagnosed, and diagnosed HIV positive. A 
multistate model for the infection and diagnosis processes is then formulated by expressing the changes 
in these proportions by a system of differential equations. By parameterizing incidence in terms of preva- 
lence and contact rates, HIV transmission is further modeled. Use of additional data or prior information 
on demographics, risk behavior change and contact parameters allows simultaneous estimation of the 
transition rates, compartment prevalences, contact rates, and transmission probabilities. 
1. Introduction

There were an estimated 33 million people worldwide living with human immunodeficiency virus (HIV) 
in 2007 (UNAIDS and WHO, 2008). Despite huge advances in treatment and prevention, the epidemic is 
still having a large impact, with unacceptably high numbers of deaths and new infections. In the United 
Kingdom, it was estimated that 73 300 (68 800-78 500) adults aged 15-59 years were living with HIV in 
2007, of whom 28% (24-33%) were still unaware of their infection (Health Protection Agency, 2008). 
Moreover (Presanis and others, 2010), the prevalence of undiagnosed infection among adults aged 15-44
years in England and Wales has not decreased significantly since 2001, indicative of ongoing transmission 
in some risk groups. There is hence a clear need to quantify HIV incidence and diagnosis rates in order to 
plan, implement, and evaluate interventions to reduce transmission. 

HIV incidence is hard to measure directly. In the past, 2 approaches have been employed to understand 
the rate of new infection: estimation and simulation. The most common method for incidence estimation 
in the early years of the epidemic was back-calculation (Brookmeyer and Gail, 1994), using information 
on AIDS diagnoses and the incubation period. Although the method has developed over the years, to 
incorporate additional information and to cope with new challenges (e.g. De Angelis and others, 1998; 
Downs and others, 2000; Becker and others, 2003; Sweeting and others, 2005), other methods for esti- 
mating incidence have also been explored. Estimation of disease incidence from a series of cross-sectional 
prevalence surveys is widely established in epidemiology (Keiding, 1991). In the HIV literature, methods 
making use of the "prevalence = incidence x duration" relationship to estimate HIV incidence from either 
age- and time-stratified seroprevalence data (Ades and Medley, 1994) or a single cross-sectional survey of 
individuals (snapshot samples) tested for one or more markers of recent HIV infection (Brookmeyer and 
Quinn, 1995) have been developed. "Snapshot sampling" methods are increasingly being reconsidered as 
newly developed laboratory assays to test for immune responses soon after infection continue to appear 
(e.g. Balasubramanian and Lagakos, 2009; Sweeting and others, 2010). 

Simulation from mathematical models of the spread of infectious diseases (Anderson and May, 1992) 
has been the other key tool in investigating HIV incidence. These dynamic transmission models are mul- 
tistate models where incidence depends on the size of the infected population, that is, prevalence. They 
are typically variations on a basic epidemic model, the susceptible-infected-removed (SIR) model (see, 
e.g. Anderson and Garnett, 2000; Becker and Marschner, 2001, for reviews in the HIV and sexually 
transmitted infection field). Forward simulation from fixed sets of parameters is performed to understand 
"qualitatively" the effects of various parameter values and/or interventions. Although in recent years, 
some attempts have been made to move toward a more inferential framework for simple deterministic 
HIV transmission models (e.g. Alkema and others, 2007), fully inferential approaches for more realistic 
deterministic transmission models are still rare, particularly for HIV (Punyacharoensin and others, 2010 
unpublished data). Importantly, there is still a gap between the epidemic models used by biomathemati- 
cians to represent the evolution of an epidemic, and the inferential approaches statisticians employ to 
estimate specific aspects of an epidemic. This gap is driven both by a lack of detailed data on epidemics 
and the consequent differences in motivation between the 2 fields (Solomon and Isham, 2000; Matthews 
and Woolhouse, 2005). 

In this paper, we propose a Bayesian evidence synthesis drawing upon ideas of estimation of incidence 
from serial prevalence and extending this to inference for a deterministic dynamic transmission model. 
"Evidence synthesis" refers to a growing and broad class of statistical analyses that combine data from 
disparate data sources to produce an estimate of key quantities of interest (e.g. Eddy and others, 1992; 
Ades and Sutton, 2006; Jackson and others, 2008). Goubar and others (2008) proposed a formal evidence 
synthesis to estimate HIV prevalence in England and Wales in 2001, using a complex probabilistic model. 
This model, further developed, has been applied to a series of data sets from recent years to estimate 
the trends in prevalence, particularly of undiagnosed infection (Presanis and others, 2010). Here, we use 
these estimates of serial prevalence to inform a multistate model of the processes of HIV infection and
diagnosis, to estimate incidence and diagnosis rates. We employ an evidence synthesis framework to 
combine the prevalence data with information on demographics, behavior change, and diagnosis rates. 
We further develop the multistate model into a nonlinear deterministic dynamic transmission model by 
parameterizing incidence in terms of prevalence, contact rates, and the probability of transmission given 
an infectious contact. This is the first time a deterministic epidemic model for HIV has been implemented 
in a fully Bayesian framework with the explicit aim of drawing formal inferences simultaneously about 
recent prevalence, incidence, contact rates, and transmission probabilities, based on a synthesis of all 
available relevant evidence. 

We review the prevalence model in Section 2 before describing the multistate model for incidence in 
Section 3. Section 4 describes the methods we use to assess models. Results from the base incidence model 
are given in Section 5 before developing the transmission model in Section 6. We end with a discussion 
in Section 7. 

2. Prevalence model 

The prevalence model has been described in full elsewhere (Goubar and others, 2008; Presanis and others, 
2008; Presanis, 2010), but briefly, the aim was to estimate by region $r$ and risk group $g$: the proportion
of the population in each stratum, $\rho_{g,r}$, HIV prevalence, $\pi_{g,r}$, and the proportion of HIV infections that
are diagnosed, $\delta_{g,r}$. We synthesized: data from behavioral surveys such as the National Survey of Sexual
Attitudes and Lifestyles ("NATSAL" Johnson and others, 2001) on the proportions $\rho_{g,r}$; data from unlinked
anonymous seroprevalence surveys ("UA surveys" Public Health Laboratory Service and others,
2002) on either $\pi_{g,r}$, $\delta_{g,r}$, or prevalence of undiagnosed infection, $\pi_{g,r}(1 - \delta_{g,r})$; and finally data from
the Survey of Prevalent HIV Infections Diagnosed ("SOPHID" McHenry and others, 2000) on the total
number ($T_r \sum_{g} \rho_{g,r} \pi_{g,r} \delta_{g,r}$) and risk group composition
($\rho_{g,r} \pi_{g,r} \delta_{g,r} / \sum_{g} \rho_{g,r} \pi_{g,r} \delta_{g,r}$) of diagnosed infections, where
$T_r$ is the total population size for region $r$. Figure 1 is a schematic directed acyclic graph (DAG) showing
how we assume the parameters generate the data. 

In Presanis and others (2010), the prevalence model is fitted simultaneously to each of $K$ sets of
data $Y_{t_k}$, $k \in \{1, \ldots, K\}$, one referring to each year from 2001 to 2007, so that $K = 7$. Hence, each
parameter is indexed also by the series of timepoints $t_k$, $k \in \{1, \ldots, K\}$, where $t_k$ is defined as the date
31st December 2000 + $k$ years. The evidence synthesis is carried out in a Bayesian framework with vague
uniform prior distributions for all basic parameters $\rho_{t_k,g,r}$, $\pi_{t_k,g,r}$, and $\delta_{t_k,g,r}$ and Binomial or Poisson
likelihoods (Section 1 of the supplementary material available at Biostatistics online).

Restricting attention to one risk group, men who have sex with men (MSM) in England and Wales,
we obtain for each timepoint $t_k$ the joint posterior distribution of the proportion of the total population $T(t_k)$
in each of 4 compartments: not at risk (i.e. not MSM), $1 - \rho(t_k)$; susceptible to HIV, $\rho(t_k)\{1 - \pi(t_k)\}$;
HIV positive but undiagnosed, $\rho(t_k)\pi(t_k)\{1 - \delta(t_k)\}$; and HIV positive and diagnosed, $\rho(t_k)\pi(t_k)\delta(t_k)$
(Figure 1 of the supplementary material available at Biostatistics online). Denote by
$\Pi(t_k) = \{\rho(t_k), \pi(t_k), \delta(t_k)\}$ the vector of prevalence parameters.

3. A combined incidence and prevalence model

The estimates from the prevalence model of the 4 compartment proportions (Table 1 of the supplemen- 
tary material available at Biostatistics online) may be used to infer HIV incidence and diagnosis rates by 
considering the continuous-time multistate model shown in Figure 2. We have 2 choices: we can consider 
the incidence model to be the second stage in a 2-stage process, estimating first serial prevalence, then
"plugging in" the prevalence estimates to the incidence model; or we can formulate a joint model, combining
both into a single synthesis. This allows simultaneous estimation of transition rates and compartment
prevalences. We concentrate here on the joint model; for an exposition of the 2-stage model, see Presanis
(2010).

Fig. 1. Schematic DAG for the prevalence model.

Fig. 2. Markov multistate model describing the male population of England and Wales in terms of risk group
(non-MSM E, MSM) and HIV infection states: Susceptible, Undiagnosed, and Diagnosed.

Consider the total number of men in England and Wales aged 15-44 years, regardless of risk group.
Let $E$ denote the set of non-MSM and $S \cup U \cup D$ the set of MSM, composed of Susceptible ($S$) + infected
Undiagnosed ($U$) + Diagnosed ($D$) MSM (Figure 2). For each generic compartment $C \in \{E, S, U, D\}$,
denote the number of men in the compartment at time $t$ by $C(t)$ and the proportion of men in the
compartment by $c(t) = C(t)/T(t)$, where $T(t) = E(t) + S(t) + U(t) + D(t)$ is the total number of
men. Denote by $\mathbf{c}(t) = \{e(t), s(t), u(t), d(t)\}$ the vector of compartment proportions at time $t$, where
$e(t) = 1 - \rho(t)$, $s(t) = \rho(t)\{1 - \pi(t)\}$, $u(t) = \rho(t)\pi(t)\{1 - \delta(t)\}$, and $d(t) = \rho(t)\pi(t)\delta(t)$.

Let $\alpha(t)$ be the rate of entry of new 15-year-old men into the system and $\psi$ the rate at which men
move from $E$ to $S$. $\psi$ is assumed constant over time. Denote by $\lambda(t)$ the incidence of HIV and $\kappa(t)$
the diagnosis rate. Let $\mu(t) = \{\mu_c(t): c \in \{E, S, U, D\}\}$, $\gamma(t) = \{\gamma_c(t): c \in \{E, S, U, D\}\}$, and
$\epsilon(t) = \{\epsilon_{c,j}(t): c \in \{E, S, U, D\}, j \in \{\mathrm{In}, \mathrm{Out}\}\}$ be the vectors of mortality rates, exit rates out of
the population due to age, and migration rates into/out of the population, respectively, where each element
refers to a single compartment-specific rate. We assume all inward migration into the population occurs
into the state $E$, for simplicity. Alternative migration assumptions have also been explored (Presanis,
2010). Assume all rates except $\psi$ are piecewise constant with break points at the end of each year, so that,
for example, $\lambda(t) = \lambda(t_{k-1})$ if $t_{k-1} \le t < t_k$ for each $k \in \{2, \ldots, K\}$, and hence $\{\lambda(t_k): k \in \{2, \ldots, K\}\}$
is a vector of 6 parameters. Note that as we have information on the compartment prevalences only at the
set $\{t_k: k \in \{1, \ldots, K\}\}$, that is, on $\{\mathbf{c}(t_k): k \in \{1, \ldots, K\}\}$ via the prevalence model, we concentrate
on estimating the set of rates $\{\theta(t_k): k \in \{2, \ldots, K\}\}$, where $\theta(t_k)$ denotes the vector of rate
parameters at time $t_k$, namely $\{\alpha(t_k), \psi, \lambda(t_k), \kappa(t_k), \mu(t_k), \gamma(t_k), \epsilon(t_k)\}$. As each rate is assumed piecewise
constant, this gives estimates of the rates at any time $t$ between $t_1$ and $t_K$. The prevalence parameters
$\Pi(t) = \{\rho(t), \pi(t), \delta(t)\}$ at any time $t$ are defined in terms of $\mathbf{c}(t)$:

$$
\begin{aligned}
\rho(t) &= s(t) + u(t) + d(t), \\
\pi(t) &= \frac{u(t) + d(t)}{s(t) + u(t) + d(t)}, \\
\delta(t) &= \frac{d(t)}{u(t) + d(t)}.
\end{aligned}
\qquad (3.1)
$$
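
The mapping between the prevalence parameters and the compartment proportions can be checked numerically. The following is a minimal sketch, assuming the definitions of e(t), s(t), u(t), d(t) given above and the inverse relations (3.1); the function names and the input values are hypothetical and are not estimates from the paper.

```python
def to_compartments(rho, pi, delta):
    """Map (rho, pi, delta) to the compartment proportions (e, s, u, d)."""
    e = 1.0 - rho                    # not at risk (non-MSM)
    s = rho * (1.0 - pi)             # susceptible MSM
    u = rho * pi * (1.0 - delta)     # infected, undiagnosed
    d = rho * pi * delta             # infected, diagnosed
    return e, s, u, d


def to_prevalence(s, u, d):
    """Invert the mapping using the relations in (3.1)."""
    rho = s + u + d
    pi = (u + d) / rho
    delta = d / (u + d)
    return rho, pi, delta


# Hypothetical illustrative values only.
e, s, u, d = to_compartments(rho=0.03, pi=0.05, delta=0.7)
rho, pi, delta = to_prevalence(s, u, d)
```

The round trip recovers the inputs, which is the sense in which the prevalence parameters are functions of the compartment proportions at any time t.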

The dynamics of the multistate model of Figure 2 may be described by a system of differential equations
in terms of the numbers in each compartment:

$$
\begin{aligned}
\frac{d}{dt}E(t) &= \alpha(t) + \epsilon_{E,\mathrm{In}}(t) - \{\mu_E(t) + \gamma_E(t) + \epsilon_{E,\mathrm{Out}}(t) + \psi\}E(t), \\
\frac{d}{dt}S(t) &= \psi E(t) - \{\lambda(t) + \mu_S(t) + \gamma_S(t) + \epsilon_{S,\mathrm{Out}}(t)\}S(t), \\
\frac{d}{dt}U(t) &= \lambda(t)S(t) - \{\kappa(t) + \mu_U(t) + \gamma_U(t) + \epsilon_{U,\mathrm{Out}}(t)\}U(t), \\
\frac{d}{dt}D(t) &= \kappa(t)U(t) - \{\mu_D(t) + \gamma_D(t) + \epsilon_{D,\mathrm{Out}}(t)\}D(t),
\end{aligned}
$$

or in terms of the proportion in each compartment, $e(t)$, $s(t)$, $u(t)$, and $d(t)$ (Section 2 of the
supplementary material available at Biostatistics online). We proceed using the system of equations in terms
of proportions to ensure we work in continuous-state-space rather than a continuous approximation to
a discrete-state-space. Our aim is to estimate simultaneously the prevalence ($\Pi(t_k)$) and the incidence
($\theta(t_k)$) parameters, given all the available information.
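
To illustrate how such a system can be stepped forward over one year with constant rates, here is a minimal sketch using the classical fourth-order Runge-Kutta scheme. It integrates the counts form of the equations and converts to proportions at the end, whereas the paper works directly with the proportions form given in the supplementary material. All rate values, starting counts, and dictionary keys are hypothetical, and the scheme is not necessarily the variant used by the software the authors employed.

```python
def derivatives(state, r):
    """Right-hand side of the four compartment equations (counts form)."""
    E, S, U, D = state
    dE = r["alpha"] + r["eps_in_E"] - (r["mu_E"] + r["gamma_E"] + r["eps_out_E"] + r["psi"]) * E
    dS = r["psi"] * E - (r["lam"] + r["mu_S"] + r["gamma_S"] + r["eps_out_S"]) * S
    dU = r["lam"] * S - (r["kappa"] + r["mu_U"] + r["gamma_U"] + r["eps_out_U"]) * U
    dD = r["kappa"] * U - (r["mu_D"] + r["gamma_D"] + r["eps_out_D"]) * D
    return (dE, dS, dU, dD)


def rk4_one_year(state, r, steps=100):
    """Classical fourth-order Runge-Kutta over one year with constant rates."""
    h = 1.0 / steps
    for _ in range(steps):
        k1 = derivatives(state, r)
        k2 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(state, k1)), r)
        k3 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(state, k2)), r)
        k4 = derivatives(tuple(x + h * k for x, k in zip(state, k3)), r)
        state = tuple(
            x + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


# Hypothetical rates (per year) and starting counts, for illustration only.
rates = {
    "alpha": 300.0, "eps_in_E": 100.0, "psi": 0.001,
    "lam": 0.02, "kappa": 0.3,
    "mu_E": 0.001, "mu_S": 0.001, "mu_U": 0.002, "mu_D": 0.005,
    "gamma_E": 0.03, "gamma_S": 0.03, "gamma_U": 0.03, "gamma_D": 0.03,
    "eps_out_E": 0.01, "eps_out_S": 0.01, "eps_out_U": 0.01, "eps_out_D": 0.01,
}
start = (10000.0, 300.0, 10.0, 20.0)
end = rk4_one_year(start, rates)
total = sum(end)
proportions = tuple(x / total for x in end)  # (e, s, u, d) at the year end
```

Because the rates are piecewise constant with yearly break points, repeating this step once per year with that year's rates would carry the system from $t_1$ to $t_K$.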



3.1 Data on transition rates 

As well as the data used in the prevalence model (Section 2) informing $\Pi(t_k)$ or functions thereof, data are
also available to inform the transition rates. Yearly demographic data on men aged 15-44 years in England
and Wales are available from the Office for National Statistics ("ONS" Office for National Statistics,
2008a, 2008b, 2008c). These data are the numbers: in the population, $T(t_k)$; entering the population due
to ageing, that is, aged 14 years, turning 15 years, $\alpha(t_k)$; leaving the population due to ageing, that is,
aged 44 years, turning 45 years; of deaths; and entering or leaving the population through migration.
The population sizes we assume refer to the time points
$t_k$, $k \in \{1, \ldots, K\}$. The numbers moving in and out of the population are assumed to be the number of
transitions observed between $t_{k-1}$ and $t_k$ for $k \in \{2, \ldots, K\}$.

Demographic data on MSM as a population are not currently available, so assumptions that suscepti- 
ble and undiagnosed MSM have the same demographic rates as non-MSM will be required (Section 3.2). 
Data are instead available on MSM who are Diagnosed with HIV, from SOPHID, an annual cross-sectional 
survey of individuals with diagnosed HIV infection accessing a health care facility (Table 2 of the supple- 
mentary material available at Biostatistics online). Finally, data are also available on the yearly number of 
new diagnoses among MSM (Table 2 of the supplementary material available at Biostatistics online) from 
the HIV/AIDS Patient register (Health Protection Agency, 2006). 



3.2 Model assumptions 

Details on how the transition rate data are assumed generated from the prevalence and incidence parameters
are given in Section 2 of the supplementary material available at Biostatistics online. Inference proceeds
in a Bayesian setting: all parameters except $\psi$, $\alpha(t_k)$, and $\epsilon(t_k)$ are given vague prior distributions,
independently for each $t_k$, $k \in \{2, \ldots, K\}$.

A schematic DAG of the combined prevalence and incidence model is shown in Figure 3. Note that
the functional relationships hold at any time $t$, not just the observed time-points $t_k$, $k \in \{1, \ldots, K\}$.
The proportions $\mathbf{c}(t_k)$ at the end of each year $t_k$, $k \in \{2, \ldots, K\}$ (Figure 3(b)) are defined in terms of
the transition rates $\theta(t_{k-1})$ during the year and the initial conditions for the system at the end of 2001,
$\mathbf{c}(t_1)$ (Figure 3(a)) via the system of differential equations. The prevalence model for MSM is shown in
the lower two-thirds of the DAG: the proportion of men who are MSM $\rho(t_k)$, HIV prevalence $\pi(t_k)$, and
proportion diagnosed $\delta(t_k)$ are defined in terms of the proportion in each compartment, $\mathbf{c}(t_k)$, as shown in
Figure 3 and (3.1). These prevalence parameters generate the prevalence data at the bottom of the DAG,
whereas the transition rates $\theta(t_{k-1})$ generate the transition data in the top right-hand corner of the DAG.

The combined model may result in a set of estimates of $\Pi(t_k)$ that may potentially (but not necessarily)
differ from the prevalence model estimates (Table 1 of the supplementary material available at Biostatistics
online). We would therefore expect the combined model to allow us to assess whether the data, priors, and 
model structure from the incidence model are consistent (Lu and Ades, 2006; Presanis and others, 2008) 
with those from the prevalence model. If they are consistent, we would not expect the 2 sets of estimates 
to differ substantially. If there are differences, we might conclude there is evidence of inconsistency. 
As the combined model entails simultaneous estimation of both the prevalence and the incidence model 
parameters, in the presence of inconsistency, we might expect the combined model estimates of $\Pi(t_k)$ to
be a compromise between those of the prevalence model and the constraints imposed by the incidence 
model data, priors, and structure. 



3.3 Inference 

The likelihood for the joint incidence and prevalence model is $L_\theta \times L_\Pi$, where $L_\theta$ denotes the likelihood
from the transition rate data and $L_\Pi$ denotes the prevalence model likelihood (Section 2 of supplementary
material available at Biostatistics online). Having defined our priors and likelihood, samples from the joint
posterior distribution are obtained, using a standard adaptive Metropolis-Hastings algorithm in WinBUGS
(Lunn and others, 2000). The system of differential equations is solved numerically at each Markov chain
Monte Carlo iteration for the current set of parameter values using the Runge-Kutta algorithm in the
WBDiff package (WBDiff, 2004) to provide values of the compartment proportions $\mathbf{c}(t_k)$, $k \in \{1, \ldots, K\}$,
with which to calculate the likelihood.
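
The pattern of evaluating a deterministic model inside each sampler iteration can be sketched on a toy problem. The example below is an assumption-laden illustration, not the authors' WinBUGS implementation: it uses a plain (non-adaptive) random-walk Metropolis sampler, a single hypothetical incidence rate `lam` with a Uniform(0, 1) prior, a one-compartment model with a closed-form solution in place of the Runge-Kutta solver, and invented data `y` and `n`.

```python
import math
import random


def solve_model(lam):
    """Proportion infected after one year when ds/dt = -lam * s and s(0) = 1."""
    return 1.0 - math.exp(-lam)


def log_posterior(lam, y, n):
    """Binomial log-likelihood plus a hypothetical Uniform(0, 1) prior on lam."""
    if not 0.0 < lam < 1.0:
        return -math.inf
    p = solve_model(lam)  # deterministic model evaluated at the current value
    return y * math.log(p) + (n - y) * math.log(1.0 - p)


def metropolis(y, n, iters=5000, step=0.01, seed=1):
    """Random-walk Metropolis; returns the full chain of lam values."""
    rng = random.Random(seed)
    lam = 0.1
    lp = log_posterior(lam, y, n)
    chain = []
    for _ in range(iters):
        proposal = lam + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, y, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            lam, lp = proposal, lp_new
        chain.append(lam)
    return chain


# Invented data: 50 infections observed among 1000 initially susceptible.
chain = metropolis(y=50, n=1000)
kept = chain[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

In the full model the call to `solve_model` would be replaced by the numerical solution of the four-compartment system, and the likelihood by the product of the transition and prevalence likelihoods.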
Fig. 3. Schematic DAG of the combined prevalence and incidence models. Squares/rectangles denote nodes that
are observed data and circles denote stochastic nodes. Double circles denote nodes with prior distributions, whether
diffuse or informative. Solid lines denote distributional dependencies, whereas dashed lines denote functional
relationships. Panel (a) gives the initial state of the system at time $t_1$, while Panel (b) represents the system at subsequent
timepoints $t_2, \ldots, t_K$. Data informing the prevalence part of the model are shown at the bottom of the DAG, while
data informing the transition rates in the combined model, namely the data of Section 3.1 and Table 2 of the
supplementary material available at Biostatistics online, are shown in the top right-hand corner of Panel (b).



Table 1. Number of data points ($n$), posterior mean (saturated) deviance ($\bar{D}$), (saturated) deviance evaluated at
posterior means ($D(\bar{\theta})$), effective number of parameters ($p_D$), and DIC, by model

Model                                     $n$    $\bar{D}$   $D(\bar{\theta})$   $p_D$   DIC
Prevalence                                126    124         1                   123     248
Combined base                             174    177         21                  156     334
Combined base: prevalence data            126    125         16                  110     235
Combined transmission                     174    176         22                  153     329
Combined transmission: prevalence data    126    124         16                  108     231


4. Model assessment 

To assess absolute goodness of fit, we compare the posterior mean deviance
$\bar{D} = E_{\theta \mid y}\{D(\theta)\} = \int D(\theta, y)\, p(\theta \mid y)\, d\theta$ with the number of observations $n$ (Dempster, 1997; Spiegelhalter and others, 2002).
When the model is true and under standard regularity conditions, the mean of the sampling distribution
of $\bar{D}$ is asymptotically equal to $n$. If $\bar{D}$ is much larger than $n$, a lack of fit of the model to the data is
suggested. However, how close $\bar{D}$ needs to be to $n$ to constitute goodness of fit is an open question.

A Bayesian transmission dynamic model for HIV

Further deviance summaries have been proposed for assessing Bayesian model fit. Denote by $D(\bar{\theta})$
the deviance calculated at the posterior mean $\bar{\theta}$ of the parameters $\theta$, and define the effective number of
parameters to be $p_D = \bar{D} - D(\bar{\theta})$. $p_D$ is a measure of model complexity. To compare models, we employ
the deviance information criterion ("DIC" Spiegelhalter and others, 2002), defined, analogously to the
Akaike information criterion (Akaike, 1973), as $\mathrm{DIC} = D(\bar{\theta}) + 2p_D = \bar{D} + p_D$, that is, a measure of
model adequacy penalized by a measure of model complexity.
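
As a worked check of these definitions, the sketch below recomputes the effective number of parameters and the DIC from the rounded posterior mean deviance and deviance at the posterior mean reported in Table 1. The function name is hypothetical; only the two identities stated above are used.

```python
def deviance_summaries(mean_deviance, deviance_at_mean):
    """Return (p_D, DIC) from the posterior mean deviance and D(theta-bar)."""
    p_d = mean_deviance - deviance_at_mean
    dic = mean_deviance + p_d  # equivalently deviance_at_mean + 2 * p_d
    return p_d, dic


# Rounded values taken from Table 1: (posterior mean deviance, D(theta-bar)).
table1 = {
    "Prevalence": (124, 1),
    "Combined base": (177, 21),
    "Combined transmission": (176, 22),
}
results = {name: deviance_summaries(*values) for name, values in table1.items()}
```

The recomputed values agree with the tabulated $p_D$ and DIC to within one unit, a discrepancy that is presumably due to rounding of the published figures.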

5. Results 

Results from the combined model suggest an increase in incidence (95% posterior probability that
$\lambda_{2007} > \lambda_{2002}$), with small troughs in 2002 and 2005, together with a slightly increasing diagnosis rate over time
(Figures 4(a) and (b)). An increasing trend in the risk group proportion $\rho$ over time (Figure 4(c)) is also
estimated, in contrast to the prevalence only model, where the trend is constant. The additional data on
transitions, priors, and structure of the combined model clearly have a different impact than those of
the prevalence model on the posterior distribution of the MSM population. In the combined model, the
estimate of $\rho$ is determined by all the transition rates (Figure 3) as well as by the NATSAL data, whereas in
the prevalence model, it is influenced only by the NATSAL data of 2001, assumed to provide an unbiased
estimate of $\rho$ in all years 2001-2007. This assumption, perhaps implausible, is made to compensate for a
lack of more recent information on the MSM population size. For $\rho$ to be estimated as increasing, the
transition rate data and priors induce net inflows to the set of MSM, $S \cup U \cup D$, that imply a greater rate
of increase of MSM than the rate of increase in the total population would suggest. The greater amount
of data in the combined model, together with the constraints imposed by the differential equations, also
results in tighter posterior distributions for $\rho$ than the prevalence model (Figure 4(c)).

The top half of Table 1 gives deviance summaries for the prevalence and combined models. Given the
different amounts of data involved in the 2 models, the full DICs are clearly not comparable. However,
we can compare the DIC of the prevalence model (248) to the DIC contributions of the prevalence data in
the combined model (235, row 3 in the table). We find an equal fit to the prevalence data in both models
($\bar{D} = 125$ in the combined model compared to 124 in the prevalence model) but a smaller effective
number of parameters in the combined model and hence a smaller DIC. The smaller $p_D$ (110 compared
to 123) is due to the fact that in the combined model, the prevalence parameters $\rho(t)$, $\pi(t)$, and $\delta(t)$ are
functions of the compartment proportions $\mathbf{c}(t)$ and hence are functions of the compartment proportions
in 2001, $\mathbf{c}(t_1)$. In contrast, the prevalence model has no shared parameters over time. The equal fit to
the prevalence data in the prevalence and combined models suggests the differing estimates of trend in
$\rho$ in the 2 models do not manifest as a lack of fit to the NATSAL data. This is unsurprising given the
uncertainty in the estimates of $\rho$ in the prevalence model compared to the combined model (Figure 4(c)):
the posterior distributions from the latter are contained within the range of the prevalence model posterior
distributions. We therefore would not expect $\bar{D}$ and DIC to identify any lack of fit to the NATSAL data.

6. Transmission modeling 

Explicitly modeling the disease transmission process allows us to understand the relationship between 
incidence and prevalence, and hence the potential effect of interventions on an epidemic. To do so, we need 
to examine the relationship between disease incidence and 3 key factors: the prevalence of disease, the 
contact structure in the population of interest, and the probability of disease transmission given a contact. 
This allows us to partition incidence by contact group, where these groups may be defined by behavioral 
risk or diagnosis status, for example. Prevention and diagnosis policies may then be focussed at groups at 
highest risk of transmission. A first step toward understanding the relationship is simply to estimate the 
ratio of incidence to prevalence: that is, express $\lambda(t) = \beta(t)\pi(t)$. The ratio $\beta(t)$ is the effective contact rate
("ECR" hereafter), the average number of contacts per unit time a susceptible individual makes that are
sufficient for transmission if the contact is with an infectious individual. In the HIV/STI context, factors
determining "sufficiency" for transmission (i.e. affecting $\beta(t)$) include factors affecting both contact rates
and transmission given an infectious contact (Anderson and Garnett, 2000). $\beta(t)$ may therefore be further
defined (Keeling and Rohani, 2007) as $\chi(t)p_\tau$, where $\chi(t)$ is the average contact rate experienced by
a susceptible individual and $p_\tau$ is the transmission probability conditional on contact with an infectious
individual. Incorporation of information on either of the 2 components $\chi(t)$ and $p_\tau$ allows identification
of the other.

Fig. 4. Posterior distributions from the combined model of incidence (a) and diagnosis (b) rates, and posterior
distributions of proportion who are MSM (c), by model: prevalence (grey) and combined (black). Note that these distributions
are plotted at the year-end break points only, but the rates are in fact piecewise constant.
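
The decomposition of incidence into prevalence, contact rate, and transmission probability can be illustrated with a short calculation. This is a minimal sketch under the relations stated in the text, namely incidence equals the effective contact rate times prevalence, and the effective contact rate equals the contact rate times the transmission probability. The function names and all numerical inputs are hypothetical.

```python
def effective_contact_rate(incidence, prevalence):
    """Effective contact rate: the ratio of incidence to prevalence."""
    return incidence / prevalence


def transmission_probability(ecr, contact_rate):
    """Transmission probability implied by the ECR once the contact rate is known."""
    return ecr / contact_rate


# Hypothetical inputs: incidence per year, prevalence, new partners per year.
ecr = effective_contact_rate(incidence=0.01, prevalence=0.05)
p_transmit = transmission_probability(ecr, contact_rate=4.0)
```

The second function shows why information on one of the two components is enough to identify the other: once the ECR is estimated, fixing the contact rate determines the transmission probability, and vice versa.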

Once incidence is parameterized in terms of prevalence, the multistate model described so far is a
generalization of the classic SIR model, where the susceptibles are MSM in state $S$, the infectious state
$I$ is split into 2 states, $U$ and $D$, and the removed state $R$ is the population in $E$ (not at risk) plus the
population outside of the system.

We assume that 2 groups are mixing effectively with susceptible MSM, both undiagnosed and
diagnosed HIV positive MSM, so that incidence due to each group may be estimated. We parameterize
$\lambda(t) = \chi(t)p_{\tau_U}\pi(t)\{1 - \delta(t)\} + \chi(t)p_{\tau_D}\pi(t)\delta(t)$, where $\chi(t)$ is the average number of new partners a
susceptible MSM has per year (so that "contact" is interpreted as a partnership), and $p_{\tau_U}$ and $p_{\tau_D}$ are the
transmission probabilities conditional on contact with $U$ and $D$, respectively (Section 3 of supplementary
material available at Biostatistics online). The contact rate is piecewise constant, with break points at
each year end $t_k$, $k \in \{2, \ldots, K\}$, and is given a Gamma(16,4) prior independently for each year, reflecting
estimates of $\chi(t)$ from NATSAL. The transmission probabilities have the following prior distributions:
$p_{\tau_U} \sim \mathrm{Uniform}(0, 0.3)$ and $p_{\tau_D} \sim \mathrm{Uniform}(0, p_{\tau_U})$. This model allows for differing transmission
probabilities for individuals in $U$ and $D$ since diagnosed individuals may be on treatment, with consequent
lower probabilities of transmission given contact. The prior for $p_{\tau_U}$ reflects the wide range of estimates
of this quantity in the literature (Baggaley, 2006). Denote this the combined "transmission model," as
opposed to the combined "base model" of Section 3. This is the simplest possible model for how groups
mix, assuming homogeneous mixing of susceptibles with the 2 classes of infectious MSM, that is, that
the probability of choosing an infectious partner is proportional to the prevalence in the group. We start
with this model as a proof of concept, paving the way for future development to more realistically model
mixing and transmission in MSM.
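
The way this parameterization partitions incidence between the two infectious groups can be sketched as follows. The function name and the input values are hypothetical. The prior summary assumes that Gamma(16,4) is parameterized by shape and rate, as is conventional in BUGS-type software; under that assumption the prior mean is 4 new partners per year.

```python
import math


def incidence_by_source(chi, p_u, p_d, pi, delta):
    """Incidence attributable to undiagnosed and to diagnosed partners."""
    from_undiagnosed = chi * p_u * pi * (1.0 - delta)
    from_diagnosed = chi * p_d * pi * delta
    return from_undiagnosed, from_diagnosed


# Gamma(16, 4) prior on the contact rate, assuming a shape/rate parameterization.
shape, rate = 16.0, 4.0
prior_mean = shape / rate           # mean number of new partners per year
prior_sd = math.sqrt(shape) / rate  # prior standard deviation

# Hypothetical parameter values, chosen so that p_d <= p_u as the priors require.
lam_u, lam_d = incidence_by_source(chi=4.0, p_u=0.05, p_d=0.01, pi=0.05, delta=0.7)
lam_total = lam_u + lam_d
```

With these illustrative inputs most of the incidence comes from the undiagnosed group even though most infections are diagnosed, because the assumed per-partnership transmission probability is higher for undiagnosed partners.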

6.1 Results 

Posterior distributions by model for incidence, prevalence, ECRs, contact rate, and transmission probabil- 
ities are shown in Figure 5. Uncertainty is greater for the base model (red) than for the transmission model 
(blue) due to the greater constraints imposed by mechanistically modeling the transmission process and 
the informative prior distributions on the contact and transmission parameters. Note that the transmission 
model tends to smooth the temporal trends in incidence and prevalence slightly in comparison to the base 
model, and both smooth the trend in prevalence compared to the prevalence model. Both models suggest 
a slight increase in incidence over time, with the transmission model suggesting this increase is driven 
by both new infections due to the sustained prevalence of undiagnosed infection and by incidence due to 
diagnosed individuals (Figure 5). 

In terms of deviance (Table 1), we find very little difference in the absolute fit to the data ($\bar{D} = 176$ in
the transmission model compared to 177 in the base model). The effective number of parameters is slightly
lower in the transmission ($p_D = 153$) than the base model ($p_D = 156$), as the increase in parameters due
to introducing $\chi(t)$ and $p_{\tau_U}$ is offset by a greater reduction due to parameterizing incidence as a function
of $\chi(t)$ and $p_{\tau_U}$ and to the fact that their priors are informative. The DIC is therefore also slightly lower
(329 in the transmission model compared to 334 in the base model).

Fig. 5. Posterior distributions from the 2 combined models: base (grey), and transmission (black).

The transmission probability $p_{\tau_U}$ is constrained, by the definition of incidence and by $p_{\tau_D}$ being
bounded above by $p_{\tau_U}$, to lie between $\lambda(t)/(\chi(t)\pi(t))$ and $\lambda(t)/(\chi(t)\pi(t)(1 - \delta(t)))$. As the contact rate $\chi(t)$ has
a fairly tight informative prior, giving a posterior distribution suggesting on average 4 new partners per
susceptible MSM per year, and as incidence, prevalence (both diagnosed and undiagnosed) and hence
the ECRs are all well identified (Figure 5), the transmission probability conditional on an undiagnosed
partnership has a tight posterior distribution relative to its Uniform(0,0.3) prior. We estimate that 5%
(3-9%) of partnerships with undiagnosed MSM result in transmission. This estimate is in the lower range of
per-partnership probability estimates reviewed by Baggaley (2006) and should be interpreted with caution,
as the transmission model presented here uses a simplistic mixing assumption and does not account for
behavior (e.g. condom use).
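
The bounds on the transmission probability from undiagnosed partners follow from the incidence parameterization together with the constraint that the probability for diagnosed partners lies between zero and that for undiagnosed partners. The sketch below computes the two bounds and checks them against a forward calculation. The function name and all numerical values are hypothetical.

```python
def p_u_bounds(lam, chi, pi, delta):
    """Bounds on the transmission probability from undiagnosed partners.

    The lower bound is attained when diagnosed partners transmit with the
    same probability; the upper bound when they do not transmit at all.
    """
    lower = lam / (chi * pi)
    upper = lam / (chi * pi * (1.0 - delta))
    return lower, upper


# Hypothetical values; incidence is generated from the model equation itself.
chi, pi, delta = 4.0, 0.05, 0.7
p_u, p_d = 0.05, 0.01
lam = chi * pi * (p_u * (1.0 - delta) + p_d * delta)
lower, upper = p_u_bounds(lam, chi, pi, delta)
```

Because the proportion diagnosed is large in this illustration, the interval is wide; well-identified incidence, prevalence, and contact rates narrow it, which is consistent with the tight posterior described above.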

7. Discussion 

We have estimated HIV incidence from serial prevalence estimates, together with demographic and be- 
havior change data, using a multistate model in a Bayesian evidence synthesis framework. We have, in 
addition, embedded a (deterministic) transmission dynamic model within this framework by parameter- 
izing incidence in terms of prevalence, contact rates, and transmission probabilities. Deterministic trans- 
mission dynamic models for HIV have historically been simulation-based, with any attempts to quantify 
the uncertainty in the resulting "estimates" focused largely on scenario-type sensitivity analyses. Only 
relatively recently has a need been recognized for models that are both formally fitted to data and for 
which the uncertainty in the data sources is fully accounted. Although some steps have been made in 
this direction, relatively few fully Bayesian deterministic transmission dynamic models have so far ap- 
peared, even outside the HIV literature (Cancre and others, 2000). Alkema and others (2007) describe a 
susceptible-infected model for HIV implemented in a "Bayesian melding" framework, designed to make 
short-term projections of HIV prevalence using a sampling importance resampling algorithm. The authors 
also estimate the effective contact rate, assuming this to be constant over time, but do not estimate HIV 
incidence. Moreover, the authors fit the susceptible-infected model to prevalence data, without explicitly 
modeling HIV prevalence. 

In this paper, on the other hand, we have shown the feasibility of simultaneously estimating HIV 
prevalence, incidence, contact rates, and transmission probabilities. We have estimated a rise in 
incidence in MSM, with posterior probability that incidence in 2007 is greater than in 2002 of 95%. We 
have further estimated that this rise is due to both diagnosed and undiagnosed infectious MSM, although 
this finding relies on somewhat simplistic behavioral and mixing assumptions. 
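
A posterior probability such as the one comparing incidence in 2007 with incidence in 2002 is typically 
computed as the fraction of posterior draws in which the inequality holds. The following is a minimal 
sketch of that calculation, not the code used for the analysis. The function name is hypothetical and the 
draws are simulated stand-ins rather than output from the fitted model. 

```python
import random


def prob_greater(draws_later, draws_earlier):
    """Fraction of paired posterior draws in which the later value exceeds the earlier one."""
    if len(draws_later) != len(draws_earlier):
        raise ValueError("draws must be paired, one pair per MCMC iteration")
    hits = sum(later > earlier for later, earlier in zip(draws_later, draws_earlier))
    return hits / len(draws_later)


# Simulated stand-ins for posterior draws (hypothetical values, NOT results
# from the paper); real draws would come from the same MCMC iterations.
rng = random.Random(1)
incidence_2002 = [rng.gauss(0.010, 0.002) for _ in range(5000)]
incidence_2007 = [rng.gauss(0.014, 0.002) for _ in range(5000)]

print(prob_greater(incidence_2007, incidence_2002))
```

Pairing draws by iteration matters because incidence in the two years is correlated in the joint 
posterior; comparing independently shuffled draws would ignore that dependence. 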

The combined and prevalence models we compared were fitted to different sets of data. In trying to 
understand the value of adding further data and potentially further parameters to model the extra data, the 
DIC may not be employed directly, due to the differing amounts of information. We therefore compared 
only the contributions to DIC of those data points that are common to all models under examination. 
While informally this technique allows us to compare relative goodness of fit of different models to the 
specific common subsets of data, it is unclear how to formally evaluate which model is "best." Inevitably, 
plausibility and usefulness in a public health setting must be taken into account. Model choice in this case 
study (Presanis, 2010) has been informally driven by considerations of both credibility of the underly- 
ing migration, contact, and transmission structures and assessment of when differences in DIC become 
sufficiently large to discriminate models. Understanding the DIC's distribution is the subject of ongo- 
ing research (Spiegelhalter and others, 2002; Celeux and others, 2006; Plummer, 2008), in particular 
for complex probabilistic models. This case study suggests the DIC may not be the most useful tool 
for model selection in this context, so that further work is required to explore other model comparison 
methods. 
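
The restriction of the DIC comparison to shared data can be sketched as follows, assuming each model's DIC 
decomposes into a sum of contributions from individual data points. The function name, model labels, data 
point labels, and numerical values are all hypothetical and serve only to illustrate the bookkeeping. 

```python
def common_dic(contributions_by_model):
    """Sum per-data-point DIC contributions over data points shared by all models.

    contributions_by_model maps a model label to a dict
    {data_point_label: DIC contribution of that data point}.
    """
    if not contributions_by_model:
        return {}
    key_sets = [set(contrib) for contrib in contributions_by_model.values()]
    common = set.intersection(*key_sets)
    return {
        model: sum(contrib[key] for key in common)
        for model, contrib in contributions_by_model.items()
    }


# Hypothetical contributions: the second model is fitted to one extra data
# point ("y3"), which is excluded from the comparison.
contributions = {
    "prevalence": {"y1": 3.1, "y2": 2.9},
    "combined": {"y1": 3.4, "y2": 2.7, "y3": 5.0},
}
totals = common_dic(contributions)
print(totals)
```

Differences between the resulting totals indicate relative fit to the shared data only; as noted above, 
they do not amount to a formal criterion for choosing between models. 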

Our dynamic transmission model based on evidence synthesis is clearly a first step in the HIV appli- 
cation: there are many extensions to be explored. First, work is in progress to elaborate the contact and 
transmission structure (Presanis, 2010), by considering more realistic contact patterns, such as preferential 
mixing, and stratification by behavioral risk, as results are likely to be sensitive to these model assumptions. 
Additionally, there is a growing body of evidence on antiretroviral treatment and treatment resistance (UK 
Group on Transmitted HIV Drug Resistance, 2005; Brown and others, 2009) that may be used to inform 
probabilities of transmission from diagnosed to susceptible individuals. 

Second, further work is required to model the MSM population realistically. The combined model 
results in different estimated trends for the proportion of men who are MSM than the prevalence model, 
with the combined model suggesting a greater rate of increase in the MSM population than in general. 
Additionally, compared to the prevalence model, the estimated trends in $\pi$ and $\delta$ in the incidence models 
are somewhat smoothed (results not shown), due perhaps to the dependence over time in dynamic models 
not allowing for abrupt changes from year to year. An increase in $\rho$ over time is not implausible (Mercer 
and others, 2004), whereas our assumption that NATSAL provides unbiased information for each year, 
necessitated by lack of recent evidence, is perhaps unrealistic. We have explored various migration as- 
sumptions and have assessed the different models through comparisons of the DIC contributions of data 
common to all models under consideration (results not shown; Presanis, 2010). We have chosen to con- 
centrate on models assuming outward migration occurs from all states, with inward migration occurring 
only into the non-MSM group, as differences in DIC between the different migration models are small. 
Migration and population data for MSM are currently limited but further data are being collected (Jolozo 
and others, 2010; National Centre for Social Research, 2010), giving potential for further work modeling 
the MSM population. 

Third, the models presented here make a number of prior and likelihood assumptions, to which results 
may be sensitive. The Binomial and Poisson likelihoods assumed in both the prevalence and combined 
models do not allow for any overdispersion in the data. Although the deviance summaries indicate no 
lack of fit to the data, so that at first glance there is little evidence of overdispersion, further sensitivity 
analyses are in progress. Investigation of the contribution of each piece of evidence (data, priors, or model 
assumptions) to the inference is also underway. 

Other possibilities include extending the model to other risk groups, stratifying by heterosexual or 
parenteral transmission, gender, age, region of birth, and region of residence, for example. Transmission 
and contact rates could also be allowed to vary by time since infection. The piecewise constant transition 
rates could instead be allowed to vary smoothly over time, through the use of splines for example. While 
there are many possibilities for developing this work, we have nevertheless demonstrated an important 
step toward the joint modeling of HIV prevalence, incidence, and the transmission mechanism linking the 
2 in a fully inferential framework. 

Supplementary materials 
Supplementary material is available at http://biostatistics.oxfordjournals.org. 